# File Descriptions

## 1. `final_all.csv`
This CSV file summarizes basic information for all molecules in the QCDGE database.

| Column Name        | Description                                                                 |
|--------------------|-----------------------------------------------------------------------------|
| Index              | Unique identifier of the molecule in the QCDGE database. The two-letter prefix indicates the source: `Aa`–QM9, `Ab`–GDB-11, `Ba`–PubChemQC (≤9 heavy atoms: C, N, O, F), `Bb`–PubChemQC (exactly 10 heavy atoms: C, N, O, F) |
| HeavyAtomCount     | Number of heavy atoms                                                       |
| RingNumber         | Number of rings                                                             |
| CompoundType       | Type of compound                                                            |
| Smiles_pybel       | SMILES generated via Pybel interface                                        |
| InchI_pybel        | InChI generated via Pybel interface                                         |
| Smiles_rdkit       | SMILES generated via RDKit interface                                        |
| InchI_rdkit        | InChI generated via RDKit interface                                         |
| Smiles_rdkit_can   | Canonical SMILES generated via RDKit interface                              |
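
As an illustration of how the `Index` prefix can be used, the sketch below counts molecules per source with the standard `csv` module. It assumes the file is comma-delimited with a header row matching the column names above. The helper name `count_by_source` is hypothetical, and the sample rows are made up so the snippet runs without the real file.

```python
import csv
import io
from collections import Counter

# Prefix-to-source mapping taken from the Index column description.
SOURCES = {
    "Aa": "QM9",
    "Ab": "GDB-11",
    "Ba": "PubChemQC (<=9 CNOF)",
    "Bb": "PubChemQC (=10 CNOF)",
}


def count_by_source(rows):
    """Count molecules per source using the two-letter prefix of Index."""
    return Counter(SOURCES.get(row["Index"][:2], "unknown") for row in rows)


# Tiny in-memory stand-in for final_all.csv; the values are illustrative only.
sample = io.StringIO(
    "Index,HeavyAtomCount,RingNumber\n"
    "Aa000000001,1,0\n"
    "Bb025418778,10,1\n"
    "Bb000000002,10,2\n"
)
counts = count_by_source(csv.DictReader(sample))
print(counts)
```

For the real file, pass `csv.DictReader(open("final_all.csv", newline=""))` instead of the in-memory sample.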

## 2. `final_all.hdf5`
Contains all data in the QCDGE database. Each molecule is stored under its full index (for example, `Bb025418778`).

### Ground State Properties:
1. **labels** – Atomic labels  
2. **coords** – Optimized Cartesian coordinates  
3. **Etot** – Total energy  
4. **e_homo_lumo** – HOMO and LUMO energies  
5. **polarizability** – Isotropic polarizability  
6. **dipole** – Dipole moment  
7. **quadrupole** – Quadrupole moment  
8. **zpve** – Zero-point vibrational energy  
9. **rot_constants** – Rotational constants  
10. **elec_spatial_ext** – Electronic spatial extent  
11. **thermal** – Thermal properties at 298.15 K  
12. **freqs** – Harmonic vibrational frequencies  
13. **mulliken** – Mulliken charges  
14. **cv** – Heat capacity at 298.15 K  

### Excited State Properties:
1. **Etot** – Ground-state energy  
2. **e_homo_lumo** – HOMO and LUMO energies  
3. **dipole** – Dipole moment  
4. **quadrupole** – Quadrupole moment  
5. **rot_constants** – Rotational constants  
6. **elec_spatial_ext** – Electronic spatial extent  
7. **mulliken** – Mulliken charges  
8. **transition_electric_DM** – Transition electric dipole moments  
9. **transition_velocity_DM** – Transition velocity dipole moments  
10. **transition_magnetic_DM** – Transition magnetic dipole moments  
11. **transition_velocity_QM** – Transition velocity quadrupole moments  
12. **OrbNum_HomoLumo** – Orbital numbers of HOMO and LUMO  
13. **Info_of_AllExcitedStates** – Electronic characters of 10 singlet and 10 triplet excited states  

**Example**: Retrieve HOMO and LUMO orbital numbers for molecule `Bb025418778`:

```python
import h5py

with h5py.File('final_all.hdf5', 'r') as f:
    print(f['Bb025418778']['excited_state']['OrbNum_HomoLumo'][()])
```
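
To check which groups and datasets a molecule entry actually contains before reading values, one option is to walk the entry recursively. The sketch below is not part of the database tooling: `dataset_paths` and `molecule_datasets` are hypothetical helper names, and the code assumes only that h5py groups expose `.items()` while datasets do not.

```python
def dataset_paths(node, prefix=""):
    """Recursively collect the paths of all leaf entries under a group-like node."""
    paths = []
    for name, child in node.items():
        path = f"{prefix}/{name}" if prefix else name
        if hasattr(child, "items"):
            # Group-like child: descend into it.
            paths.extend(dataset_paths(child, path))
        else:
            # Leaf (a dataset in an HDF5 file).
            paths.append(path)
    return paths


def molecule_datasets(hdf5_path, index):
    """List dataset paths stored under one molecule index in an HDF5 file."""
    import h5py  # imported here so dataset_paths stays usable without h5py

    with h5py.File(hdf5_path, "r") as f:
        return dataset_paths(f[index])
```

Calling `molecule_datasets('final_all.hdf5', 'Bb025418778')` should return paths such as `excited_state/OrbNum_HomoLumo`, which can then be read as in the example above.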

## 3. `A_9.hdf5`
A subset of the QCDGE database containing molecules from the QM9 dataset with fewer than 10 heavy atoms (C, N, O, F).

## 4. `A_10.hdf5`
A subset of the QCDGE database containing molecules from the GDB-11 database with exactly 10 heavy atoms (C, N, O, F).

## 5. `B_9.hdf5`
A subset of the QCDGE database containing molecules from PubChemQC with fewer than 10 heavy atoms (C, N, O, F).

## 6. `B_10.hdf5`
A subset of the QCDGE database containing molecules from PubChemQC with exactly 10 heavy atoms (C, N, O, F).

> 🔍 **Note**: In these subset files, the two-letter prefix is dropped from the index. For example, `Bb025418778` can be accessed in `B_10.hdf5` using the key `025418778`.
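
The prefix rule can be expressed as a small lookup. This is a sketch only: the helper name `locate_in_subset` is hypothetical, and the prefix-to-file mapping is inferred from the source descriptions in this document (only the `Bb` case is stated explicitly).

```python
# Assumed mapping from index prefix to subset file, inferred from the
# file descriptions in this README.
SUBSET_FILES = {
    "Aa": "A_9.hdf5",
    "Ab": "A_10.hdf5",
    "Ba": "B_9.hdf5",
    "Bb": "B_10.hdf5",
}


def locate_in_subset(index):
    """Split a full QCDGE index into (subset file name, de-prefixed key)."""
    prefix, key = index[:2], index[2:]
    if prefix not in SUBSET_FILES:
        raise ValueError(f"Unknown index prefix: {prefix!r}")
    return SUBSET_FILES[prefix], key


print(locate_in_subset("Bb025418778"))  # ('B_10.hdf5', '025418778')
```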

## 7. `SHA512SUM`
Contains the SHA-512 checksums of all HDF5 files, for verifying that downloads are intact.
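
A downloaded file can be checked against its checksum with Python's `hashlib`. The sketch below assumes `SHA512SUM` uses the usual `sha512sum` layout, one `<hex digest>  <filename>` pair per line, which this document does not confirm. If it does, `sha512sum -c SHA512SUM` performs the same check from a shell. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(sum_file):
    """Return {filename: bool} for each entry listed in the checksum file.

    Assumes each non-empty line reads '<hex digest>  <filename>' and that
    the listed files sit next to the checksum file.
    """
    sum_path = Path(sum_file)
    results = {}
    for line in sum_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # sha512sum marks binary mode with '*'
        results[name] = sha512_of(sum_path.parent / name) == expected.lower()
    return results
```

Reading in chunks keeps memory use flat, which matters for large HDF5 files.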

## 8. `extract_data.py`
An interactive script for extracting molecular properties from the QCDGE database. It prompts first for a filtering method and then for the properties to extract.

### Usage:
1. **Choose a filtering method** by entering a number:
   - `1` – All molecules
   - `2` – A list of molecular indices (provide the list using the `mol_list` keyword in the code)
   - `3` – Filter by heavy atom number and/or element types
   - `9` – Help for properties

2. **Select specific properties**:  
   Enter one property per line, pressing Enter after each, and finish with `#`. Enter `*` to include all properties.  
   To see the property numbers, enter `9` in the first step.

## 9. `README`
Additional details about accessing and retrieving data from the QCDGE database.
